ON THE FAST CONVERGENCE OF AN INERTIAL GRADIENT-LIKE DYNAMICS WITH VANISHING VISCOSITY

HEDY ATTOUCH, JUAN PEYPOUQUET, AND PATRICK REDONT 


Abstract. In a real Hilbert space $\mathcal H$, we study the fast convergence properties as $t \to +\infty$ of the trajectories of the second-order evolution equation

\[ \ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \nabla\Phi(x(t)) = 0, \]

where $\nabla\Phi$ is the gradient of a convex continuously differentiable function $\Phi : \mathcal H \to \mathbb R$, and $\alpha$ is a positive parameter. In this inertial system, the viscous damping coefficient $\frac{\alpha}{t}$ vanishes asymptotically in a moderate way. For $\alpha > 3$, we show that any trajectory converges weakly to a minimizer of $\Phi$, just assuming that $\operatorname{argmin}\Phi \neq \emptyset$. Strong convergence is established in various practical situations. These results complement the $O(t^{-2})$ rate of convergence for the values obtained by Su, Boyd and Candès. Time discretization of this system, and some of its variants, provides new fast converging algorithms, expanding the field of rapid methods for structured convex minimization introduced by Nesterov, and further developed by Beck and Teboulle. This study also complements recent advances due to Chambolle and Dossal.


1. Introduction 

Let $\mathcal H$ be a real Hilbert space endowed with the scalar product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$, and let $\Phi : \mathcal H \to \mathbb R$ be a (smooth) convex function. In this paper, we study the solution trajectories of the second-order differential equation

\[ \ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) + \nabla\Phi(x(t)) = 0, \tag{1} \]

with $\alpha > 0$, in terms of their asymptotic behavior as $t \to +\infty$. Although this is not our main concern, we point out that, given $t_0 > 0$, for any $x_0 \in \mathcal H$, $v_0 \in \mathcal H$, the existence of a unique global solution on $[t_0,+\infty[$ for the Cauchy problem with initial condition $x(t_0) = x_0$ and $\dot{x}(t_0) = v_0$ can be guaranteed, for instance, if $\nabla\Phi$ is Lipschitz-continuous on bounded sets.
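To get a feeling for the Cauchy problem just described, the following minimal sketch integrates the second-order equation with damping coefficient alpha/t numerically. Everything in it is an assumption made for illustration and is not taken from the paper: the scalar setting, the test potential Phi(x) = x^2/2, the classical Runge-Kutta scheme, the step size, and the name `integrate_avd`.

```python
def integrate_avd(grad, alpha, t0, x0, v0, t_end, h=1e-2):
    """Integrate x'' + (alpha/t) x' + grad(x) = 0 with classical RK4 (scalar x).

    Returns a list of (t, x, v) samples, where v stands for x'.
    """
    def rhs(t, x, v):
        # First-order form: (x, v)' = (v, -(alpha/t) v - grad(x)).
        return v, -(alpha / t) * v - grad(x)

    steps = int(round((t_end - t0) / h))
    t, x, v = t0, x0, v0
    samples = [(t, x, v)]
    for i in range(steps):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + h / 2, x + h / 2 * k1x, v + h / 2 * k1v)
        k3x, k3v = rhs(t + h / 2, x + h / 2 * k2x, v + h / 2 * k2v)
        k4x, k4v = rhs(t + h, x + h * k3x, v + h * k3v)
        x += h / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += h / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t = t0 + (i + 1) * h
        samples.append((t, x, v))
    return samples


# Hypothetical test problem: Phi(x) = x**2 / 2, so grad Phi(x) = x and min Phi = 0.
samples = integrate_avd(grad=lambda x: x, alpha=4.0, t0=1.0, x0=1.0, v0=0.0, t_end=50.0)
t_last, x_last, _ = samples[-1]
phi_last = 0.5 * x_last ** 2
```

Starting the integration at a positive initial time avoids the singularity of the damping coefficient at zero; the final value `phi_last` can then be compared with the initial value 0.5.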

The importance of this evolution system is threefold: 

1. Mechanical interpretation: It describes the position of a particle subject to a potential energy function $\Phi$ and an isotropic linear damping with a viscosity parameter $\frac{\alpha}{t}$ that vanishes asymptotically. This provides a simple model for a progressive reduction of the friction, possibly due to material fatigue.

2. Fast minimization of the function $\Phi$: Equation (1) is a particular case of the inertial gradient-like system

\[ \ddot{x}(t) + a(t)\dot{x}(t) + \nabla\Phi(x(t)) = 0, \tag{2} \]

with asymptotic vanishing damping, studied by Cabot, Engler and Gadat in [24], [25]. As shown in [24, Corollary 3.1] (under some additional conditions on $\Phi$), every solution $x(\cdot)$ of (2) satisfies $\lim_{t\to+\infty}\Phi(x(t)) = \min\Phi$, provided $\int_0^{\infty} a(t)\,dt = +\infty$. The specific case (1) was studied by Su, Boyd and Candès in [38] in terms of the rate of convergence for the values. More precisely, [38, Theorem 4.1] establishes that $\Phi(x(t)) - \min\Phi = O(t^{-2})$, whenever $\alpha \ge 3$. Unfortunately, their analysis does not entail the convergence of the trajectory itself.

3. Relationship with fast numerical optimization methods: As pointed out in [38, Section 2], for $\alpha = 3$, (1) can be seen as a continuous version of the fast convergent method of Nesterov (see [29], [30], [31], [32]), and its widely used successors, such as the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA), studied in [19]. These methods have a convergence rate of $\Phi(x_k) - \min\Phi = O(k^{-2})$, where $k$ is the number of iterations. As for the continuous-time system (1), convergence of the sequences generated by FISTA and related methods has not been established so far. This is a central and long-standing question in the study of numerical optimization methods.
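The link with Nesterov-type methods can be made concrete with a small sketch. The extrapolation coefficient (k - 1)/(k + alpha - 1), the step size, the quadratic test function and the name `inertial_gradient` are assumptions chosen for illustration; they are one common way of discretizing a vanishing damping term and are not quoted from the references above.

```python
def inertial_gradient(grad, x0, step, alpha, iters):
    """Gradient method with extrapolation coefficient (k - 1) / (k + alpha - 1)."""
    x_prev, x = x0, x0
    history = [x0]
    for k in range(1, iters + 1):
        # Inertial extrapolation: the momentum weight tends to 1 as k grows.
        y = x + (k - 1) / (k + alpha - 1) * (x - x_prev)
        # Plain gradient step taken at the extrapolated point y.
        x_prev, x = x, y - step * grad(y)
        history.append(x)
    return history


# Hypothetical test function: Phi(x) = x**2 / 2 (gradient x, Lipschitz constant 1).
phi = lambda x: 0.5 * x ** 2
xs = inertial_gradient(grad=lambda x: x, x0=1.0, step=0.1, alpha=3.0, iters=200)
values = [phi(x) for x in xs]
```

With `alpha=3.0` the coefficient reduces to (k - 1)/(k + 2), the weight usually associated with Nesterov's scheme; larger values of `alpha` damp the iterates slightly more.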

The purpose of this research is to establish the convergence of the trajectories satisfying (1), as well as the sequences generated by the corresponding numerical methods with Nesterov-type acceleration. We also complete the study with several stability properties concerning both the continuous-time system and the algorithms.

More precisely, the main contributions of this work are the following: In Section 2, we first establish the minimizing property in the general case where $\alpha > 0$ and $\inf\Phi$ is not necessarily attained. As a consequence, every weak limit point of the trajectory must be a minimizer of $\Phi$, and so, the existence of a bounded trajectory characterizes the existence of minimizers. Next, assuming $\operatorname{argmin}\Phi \neq \emptyset$ and $\alpha \ge 3$, we recover the $O(t^{-2})$ convergence rates and give several examples and counterexamples concerning the optimality of these results. Next, we show that every solution of (1) converges weakly to a minimizer of $\Phi$ provided $\alpha > 3$ and $\operatorname{argmin}\Phi \neq \emptyset$. We rely on a Lyapunov analysis, which was first used by Alvarez [3] in the context of the heavy ball with friction. For the limiting case $\alpha = 3$, which corresponds exactly to Nesterov's method, the convergence of the trajectories is still a puzzling open question. We finish this section by providing an ergodic convergence result for the acceleration of the system in case $\nabla\Phi$ is Lipschitz-continuous on sublevel sets of $\Phi$. Strong convergence is established in various practical situations enjoying further geometric features, such as strong convexity, symmetry, or nonemptiness of the interior of the solution set (see Section 3). In the strongly convex case, we obtained a surprising result: convergence of the values occurs at a rate of $O(t^{-2\alpha/3})$. Section 4 contains the analogous results for the associated Nesterov-type algorithms (which also correspond to the case $\alpha > 3$). As we were preparing the final version of this manuscript, we discovered the preprint [26] by Chambolle and Dossal, where the weak convergence result is obtained by a similar, but different, argument (see [26, Theorem 3]).

2. Minimizing property, convergence rates and weak convergence of the trajectories 

We begin this section by providing some preliminary estimations concerning the global energy of the system (1) and the distance to the minimizers of $\Phi$. These allow us to show the minimizing property of the trajectories under minimal assumptions. Next, we recover the convergence rates for the values originally given in [38] and obtain further decay estimates that ultimately imply the convergence of the solutions of (1). We finish the study by proving an ergodic convergence result for the acceleration. Several examples and counterexamples are given throughout the section.

2.1. Preliminary remarks and estimations. The existence of global solutions to (1) has been examined, for instance, in [24, Proposition 2.2] in the case of a general asymptotic vanishing damping coefficient. In our setting, for any $t_0 > 0$, $\alpha > 0$, and $(x_0, v_0) \in \mathcal H \times \mathcal H$, there exists a unique global solution $x : [t_0,+\infty[ \to \mathcal H$ of (1), satisfying the initial condition $x(t_0) = x_0$, $\dot{x}(t_0) = v_0$, under the sole assumption that $\inf\Phi > -\infty$. Taking $t_0 > 0$ comes from the singularity of the damping coefficient $a(t) = \frac{\alpha}{t}$ at zero. Indeed, since we are only concerned about the asymptotic behavior of the trajectories, we do not really care about the origin of time. If one insists on starting from $t_0 = 0$, then all the results remain valid with a time-shifted coefficient such as $a(t) = \frac{\alpha}{t+1}$.

At different points, we shall use the global energy of the system, given by $W : [t_0,+\infty[ \to \mathbb R$,

\[ W(t) = \frac{1}{2}\|\dot{x}(t)\|^2 + \Phi(x(t)). \tag{3} \]

Using (1), we immediately obtain

Lemma 2.1. Let $W$ be defined by (3). For each $t > t_0$, we have

\[ \dot{W}(t) = -\frac{\alpha}{t}\|\dot{x}(t)\|^2. \]

Hence, $W$ is nonincreasing¹, and $W_\infty = \lim_{t\to+\infty} W(t)$ exists in $\mathbb R \cup \{-\infty\}$. If $\Phi$ is bounded from below, $W_\infty$ is finite.
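For completeness, the computation behind Lemma 2.1 can be spelled out as follows. It is a short sketch that only uses the chain rule and equation (1), written with $\alpha$ for the damping parameter and $\Phi$ for the potential:

```latex
\begin{aligned}
\dot{W}(t)
  &= \langle \dot{x}(t), \ddot{x}(t) \rangle + \langle \nabla\Phi(x(t)), \dot{x}(t) \rangle \\
  &= \langle \dot{x}(t), \ddot{x}(t) + \nabla\Phi(x(t)) \rangle \\
  &= \Big\langle \dot{x}(t), -\tfrac{\alpha}{t}\,\dot{x}(t) \Big\rangle
   = -\tfrac{\alpha}{t}\,\|\dot{x}(t)\|^{2} \le 0 .
\end{aligned}
```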
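Now, given $z \in \mathcal H$, we define $h_z : [t_0,+\infty[ \to \mathbb R$ by

\[ h_z(t) = \frac{1}{2}\|x(t) - z\|^2. \tag{4} \]

By the Chain Rule, we have

\[ \dot{h}_z(t) = \langle x(t) - z, \dot{x}(t)\rangle \quad\text{and}\quad \ddot{h}_z(t) = \langle x(t) - z, \ddot{x}(t)\rangle + \|\dot{x}(t)\|^2. \]

Using (1), we obtain

\[ \ddot{h}_z(t) + \frac{\alpha}{t}\dot{h}_z(t) = \|\dot{x}(t)\|^2 + \Big\langle x(t) - z, \ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t)\Big\rangle = \|\dot{x}(t)\|^2 + \langle x(t) - z, -\nabla\Phi(x(t))\rangle. \tag{5} \]

The convexity of $\Phi$ implies

\[ \langle x(t) - z, \nabla\Phi(x(t))\rangle \ge \Phi(x(t)) - \Phi(z), \]

and we deduce that

\[ \ddot{h}_z(t) + \frac{\alpha}{t}\dot{h}_z(t) + \Phi(x(t)) - \Phi(z) \le \|\dot{x}(t)\|^2. \tag{6} \]

We have the following relationship between $h_z$ and $W$: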


¹ In fact, $W$ decreases strictly, as long as the trajectory is not stationary.





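Lemma 2.2. Take $z \in \mathcal H$, and let $W$ and $h_z$ be defined by (3) and (4), respectively. There is a constant $C$ such that

\[ \int_{t_0}^{t} \frac{1}{s}\big(W(s) - \Phi(z)\big)\,ds \le C - \frac{1}{t}\dot{h}_z(t) - \frac{3}{2\alpha}W(t). \]

Proof. Divide (6) by $t$, and use the definition of $W$ given in (3), to obtain

\[ \frac{1}{t}\ddot{h}_z(t) + \frac{\alpha}{t^2}\dot{h}_z(t) + \frac{1}{t}\big(W(t) - \Phi(z)\big) \le \frac{3}{2t}\|\dot{x}(t)\|^2. \]

Integrate this expression from $t_0$ to $t > t_0$ (use integration by parts for the first term), to obtain

\[ \int_{t_0}^{t} \frac{1}{s}\big(W(s) - \Phi(z)\big)\,ds \le \frac{1}{t_0}\dot{h}_z(t_0) - \frac{1}{t}\dot{h}_z(t) - (\alpha+1)\int_{t_0}^{t}\frac{1}{s^2}\dot{h}_z(s)\,ds + \int_{t_0}^{t}\frac{3}{2s}\|\dot{x}(s)\|^2\,ds. \tag{7} \]

On the one hand, Lemma 2.1 gives

\[ \int_{t_0}^{t}\frac{3}{2s}\|\dot{x}(s)\|^2\,ds = \frac{3}{2\alpha}\big(W(t_0) - W(t)\big). \]

On the other hand, another integration by parts yields

\[ \int_{t_0}^{t}\frac{1}{s^2}\dot{h}_z(s)\,ds = \frac{1}{t^2}h_z(t) - \frac{1}{t_0^2}h_z(t_0) + 2\int_{t_0}^{t}\frac{1}{s^3}h_z(s)\,ds \ge -\frac{1}{t_0^2}h_z(t_0). \]

Combining these inequalities with (7), we get

\[ \int_{t_0}^{t}\frac{1}{s}\big(W(s) - \Phi(z)\big)\,ds \le \frac{1}{t_0}\dot{h}_z(t_0) - \frac{1}{t}\dot{h}_z(t) + (\alpha+1)\frac{1}{t_0^2}h_z(t_0) + \frac{3}{2\alpha}\big(W(t_0) - W(t)\big) = C - \frac{1}{t}\dot{h}_z(t) - \frac{3}{2\alpha}W(t), \]

where $C$ collects the constant terms. □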


2.2. Minimizing property. It turns out that the trajectories of (1) minimize $\Phi$ in the completely general setting, where $\alpha > 0$, $\operatorname{argmin}\Phi$ is possibly empty and $\Phi$ is not necessarily bounded from below. This property was obtained by Alvarez in [3, Theorem 2.1] for the heavy ball with friction (where the damping is constant). Similar results can be found in [33].

We have the following: 

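Theorem 2.3. Let $\alpha > 0$ and suppose $x : [t_0,+\infty[ \to \mathcal H$ is a solution of (1). Then

i) $W_\infty = \lim_{t\to+\infty} W(t) = \lim_{t\to+\infty}\Phi(x(t)) = \inf\Phi \in \mathbb R \cup \{-\infty\}$.

ii) As $t \to +\infty$, every weak limit point of $x(t)$ lies in $\operatorname{argmin}\Phi$.

iii) If $\operatorname{argmin}\Phi = \emptyset$, then $\lim_{t\to+\infty}\|x(t)\| = +\infty$.

iv) If $x$ is bounded, then $\operatorname{argmin}\Phi \neq \emptyset$.

v) If $\Phi$ is bounded from below, then $\lim_{t\to+\infty}\|\dot{x}(t)\| = 0$.

vi) If $\Phi$ is bounded from below and $x$ is bounded, then $\lim_{t\to+\infty}\dot{h}_z(t) = 0$ for each $z \in \mathcal H$. Moreover,

\[ \int_{t_0}^{+\infty}\frac{1}{t}\big(\Phi(x(t)) - \min\Phi\big)\,dt < +\infty. \]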


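Proof. To prove i), first set $z \in \mathcal H$ and $\tau \ge t \ge t_0$. By Lemma 2.1, $W$ is nonincreasing. Hence, Lemma 2.2 gives

\[ \big(W(\tau) - \Phi(z)\big)\int_{t_0}^{t}\frac{ds}{s} + \frac{3}{2\alpha}W(\tau) \le C - \frac{1}{t}\dot{h}_z(t), \]

which we rewrite as

\[ \big(W(\tau) - \Phi(z)\big)\left(\int_{t_0}^{t}\frac{ds}{s} + \frac{3}{2\alpha}\right) \le C - \frac{3}{2\alpha}\Phi(z) - \frac{1}{t}\dot{h}_z(t), \]

and then

\[ \big(W(\tau) - \Phi(z)\big)\left(\ln(t) + \frac{3}{2\alpha} - \ln(t_0)\right) \le C - \frac{3}{2\alpha}\Phi(z) - \frac{1}{t}\dot{h}_z(t). \]

Integrate from $t = t_0$ to $t = \tau$ to obtain

\[ \big(W(\tau) - \Phi(z)\big)\left(\tau\ln(\tau) - t_0\ln(t_0) + t_0 - \tau + \Big(\frac{3}{2\alpha} - \ln(t_0)\Big)(\tau - t_0)\right) \le \Big(C - \frac{3}{2\alpha}\Phi(z)\Big)(\tau - t_0) - \int_{t_0}^{\tau}\frac{1}{t}\dot{h}_z(t)\,dt. \]

But

\[ \int_{t_0}^{\tau}\frac{1}{t}\dot{h}_z(t)\,dt = \frac{h_z(\tau)}{\tau} - \frac{h_z(t_0)}{t_0} + \int_{t_0}^{\tau}\frac{1}{t^2}h_z(t)\,dt \ge -\frac{h_z(t_0)}{t_0}. \]

Hence,

\[ \big(W(\tau) - \Phi(z)\big)\big(\tau\ln(\tau) + A\tau + B\big) \le C\tau + D, \]

for suitable constants $A$, $B$, $C$ and $D$. Dividing by $\tau\ln(\tau)$ and letting $\tau \to +\infty$, this immediately yields $W_\infty \le \Phi(z)$, and hence $W_\infty \le \inf\Phi$. It suffices to observe that

\[ \inf\Phi \le \liminf_{t\to+\infty}\Phi(x(t)) \le \limsup_{t\to+\infty}\Phi(x(t)) \le \lim_{t\to+\infty}W(t) = W_\infty \]

to obtain i).

Next, ii) follows from i) by the weak lower-semicontinuity of $\Phi$. Clearly, iii) and iv) are immediate consequences of ii). We obtain v) by using i) and the definition of $W$ given in (3). For vi), since $\dot{h}_z(t) = \langle x(t) - z, \dot{x}(t)\rangle$ and $x$ is bounded, v) implies $\lim_{t\to+\infty}\dot{h}_z(t) = 0$. Finally, using the definition of $W$ together with Lemma 2.2 with $z \in \operatorname{argmin}\Phi$, we get

\[ \int_{t_0}^{+\infty}\frac{1}{t}\big(\Phi(x(t)) - \min\Phi\big)\,dt \le C - \frac{3}{2\alpha}\min\Phi < +\infty, \]

which completes the proof. □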

Remark 2.4. We shall see in Theorem 2.13 that, for $\alpha > 3$, the existence of minimizers implies that every solution of (1) is bounded. This gives a converse to part iv) of Theorem 2.3.

If $\Phi$ is not bounded from below, it may be the case that $\|\dot{x}(t)\|$ does not tend to zero, as shown in the following example:

Example 2.5. Let $\mathcal H = \mathbb R$ and $\alpha > 0$. The function $x(t) = t^2$ satisfies (1) with $\Phi(x) = -2(\alpha+1)x$. Then $\lim_{t\to+\infty}\Phi(x(t)) = -\infty = \inf\Phi$, and $\lim_{t\to+\infty}\|\dot{x}(t)\| = +\infty$.
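The claim of Example 2.5 can be checked by substituting the trajectory into the equation. The sketch below does this with closed-form derivatives; the function name `residual` and the sample values of t and alpha are arbitrary choices made for illustration.

```python
def residual(t, alpha):
    """Left-hand side of (1) for x(t) = t**2 and Phi(x) = -2 (alpha + 1) x."""
    x_dot = 2.0 * t                      # first derivative of t**2
    x_ddot = 2.0                         # second derivative of t**2
    grad_phi = -2.0 * (alpha + 1.0)      # gradient of the linear potential
    return x_ddot + (alpha / t) * x_dot + grad_phi


checks = [residual(t, alpha) for t in (0.5, 1.0, 10.0, 1e3) for alpha in (0.5, 3.0, 10.0)]
```

Each entry of `checks` should vanish up to rounding, since 2 + 2 alpha - 2 (alpha + 1) = 0 for every t > 0.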

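2.3. Two "anchored" energy functions. We begin by introducing two important auxiliary functions and showing their basic properties. From now on, we assume $\operatorname{argmin}\Phi \neq \emptyset$. Fix $\lambda \ge 0$, $\xi \ge 0$, $p \ge 0$ and $x^* \in \operatorname{argmin}\Phi$. Let $x : [t_0,+\infty[ \to \mathcal H$ be a solution of (1). For $t \ge t_0$ define

\[ \mathcal E_{\lambda,\xi}(t) = t^2\big(\Phi(x(t)) - \min\Phi\big) + \frac{1}{2}\|\lambda(x(t) - x^*) + t\dot{x}(t)\|^2 + \frac{\xi}{2}\|x(t) - x^*\|^2, \]

\[ \mathcal E_{\lambda}^{p}(t) = t^{p}\,\mathcal E_{\lambda,0}(t) = t^{p}\Big(t^2\big(\Phi(x(t)) - \min\Phi\big) + \frac{1}{2}\|\lambda(x(t) - x^*) + t\dot{x}(t)\|^2\Big), \]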

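and notice that $\mathcal E_{\lambda,\xi}$ and $\mathcal E_{\lambda}^{p}$ are sums of nonnegative terms. These generalize the energy functions $\mathcal E$ and $\tilde{\mathcal E}$ introduced in [38]. More precisely, $\mathcal E = \mathcal E_{\alpha-1,0}$ and $\tilde{\mathcal E} = \mathcal E^{1}_{(2\alpha-3)/2}$.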

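We need some preparatory calculations prior to differentiating $\mathcal E_{\lambda,\xi}$ and $\mathcal E_{\lambda}^{p}$. For simplicity of notation, we do not make the dependence of $x$ or $\dot{x}$ on $t$ explicit. Notice that we use (1) in the second line to dispose of $\ddot{x}$.

\[ \frac{d}{dt}\,t^2\big(\Phi(x) - \min\Phi\big) = 2t\big(\Phi(x) - \min\Phi\big) + t^2\langle\dot{x}, \nabla\Phi(x)\rangle \]

\[ \frac{d}{dt}\,\frac{1}{2}\|\lambda(x - x^*) + t\dot{x}\|^2 = -\lambda t\langle x - x^*, \nabla\Phi(x)\rangle - \lambda(\alpha - \lambda - 1)\langle x - x^*, \dot{x}\rangle - (\alpha - \lambda - 1)t\|\dot{x}\|^2 - t^2\langle\dot{x}, \nabla\Phi(x)\rangle \]

\[ \frac{d}{dt}\,\frac{1}{2}\|x - x^*\|^2 = \langle x - x^*, \dot{x}\rangle. \]

Whence, we deduce

\[ \frac{d}{dt}\mathcal E_{\lambda,\xi}(t) = 2t\big(\Phi(x) - \min\Phi\big) - \lambda t\langle x - x^*, \nabla\Phi(x)\rangle + \big(\xi - \lambda(\alpha - \lambda - 1)\big)\langle x - x^*, \dot{x}\rangle - (\alpha - \lambda - 1)t\|\dot{x}\|^2 \tag{8} \]

\[ \frac{d}{dt}\mathcal E_{\lambda}^{p}(t) = (p+2)t^{p+1}\big(\Phi(x) - \min\Phi\big) - \lambda t^{p+1}\langle x - x^*, \nabla\Phi(x)\rangle - \lambda(\alpha - \lambda - 1 - p)t^{p}\langle x - x^*, \dot{x}\rangle + \frac{\lambda^2 p}{2}t^{p-1}\|x - x^*\|^2 - \Big(\alpha - \lambda - 1 - \frac{p}{2}\Big)t^{p+1}\|\dot{x}\|^2. \tag{9} \]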

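Remark 2.6. If $y \in \mathcal H$ and $x^* \in \operatorname{argmin}\Phi$, the convexity of $\Phi$ gives $\min\Phi = \Phi(x^*) \ge \Phi(y) + \langle\nabla\Phi(y), x^* - y\rangle$. Using this in (8) with $y = x(t)$, we obtain

\[ \frac{d}{dt}\mathcal E_{\lambda,\xi}(t) \le (2 - \lambda)t\big(\Phi(x) - \min\Phi\big) + \big(\xi - \lambda(\alpha - \lambda - 1)\big)\langle x - x^*, \dot{x}\rangle - (\alpha - \lambda - 1)t\|\dot{x}\|^2. \]

If one chooses $\xi^* = \lambda(\alpha - \lambda - 1)$, then

\[ \frac{d}{dt}\mathcal E_{\lambda,\xi^*}(t) \le (2 - \lambda)t\big(\Phi(x) - \min\Phi\big) - (\alpha - \lambda - 1)t\|\dot{x}\|^2. \]

Therefore, if $\alpha \ge 3$ and $2 \le \lambda \le \alpha - 1$, then $\mathcal E_{\lambda,\xi^*}$ is nonincreasing. The extreme cases $\lambda = 2$ and $\lambda = \alpha - 1$ are of special importance, as we shall see shortly.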
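The monotonicity stated in Remark 2.6 can be observed numerically. The sketch below is only an illustration under assumptions that are not part of the paper: a scalar quadratic potential Phi(x) = x^2/2 with minimizer 0, a classical Runge-Kutta integrator, and the hypothetical helper names `rk4_path` and `anchored_energy`.

```python
def rk4_path(alpha, t0, x0, v0, t_end, h=1e-2):
    """RK4 samples (t, x, v) of x'' + (alpha/t) x' + x = 0, i.e. Phi(x) = x**2 / 2."""
    f = lambda t, x, v: (v, -(alpha / t) * v - x)
    out, x, v = [(t0, x0, v0)], x0, v0
    for i in range(int(round((t_end - t0) / h))):
        t = t0 + i * h
        a = f(t, x, v)
        b = f(t + h / 2, x + h / 2 * a[0], v + h / 2 * a[1])
        c = f(t + h / 2, x + h / 2 * b[0], v + h / 2 * b[1])
        d = f(t + h, x + h * c[0], v + h * c[1])
        x += h / 6 * (a[0] + 2 * b[0] + 2 * c[0] + d[0])
        v += h / 6 * (a[1] + 2 * b[1] + 2 * c[1] + d[1])
        out.append((t + h, x, v))
    return out


def anchored_energy(t, x, v, lam, xi, x_star=0.0, phi_min=0.0):
    """Anchored energy with parameters (lam, xi) for the potential Phi(x) = x**2 / 2."""
    gap = 0.5 * x ** 2 - phi_min
    return t ** 2 * gap + 0.5 * (lam * (x - x_star) + t * v) ** 2 + 0.5 * xi * (x - x_star) ** 2


alpha = 4.0
path = rk4_path(alpha, t0=1.0, x0=1.0, v0=0.0, t_end=30.0)
largest_increase = {}
for lam in (2.0, alpha - 1.0):                 # the two extreme cases of the remark
    xi_star = lam * (alpha - lam - 1.0)        # the choice xi* = lam (alpha - lam - 1)
    e = [anchored_energy(t, x, v, lam, xi_star) for t, x, v in path]
    largest_increase[lam] = max(e[i + 1] - e[i] for i in range(len(e) - 1))
```

Up to discretization error, both entries of `largest_increase` should be nonpositive, which is the discrete counterpart of the energy being nonincreasing.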
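2.4. Rate of convergence for the values. We now recover convergence rate results for the value of $\Phi$ along a trajectory, already established in [38, Theorem 4.1]:

Theorem 2.7. Let $x : [t_0,+\infty[ \to \mathcal H$ be a solution of (1) and assume $\operatorname{argmin}\Phi \neq \emptyset$. If $\alpha \ge 3$, then

\[ \Phi(x(t)) - \min\Phi \le \frac{\mathcal E_{\alpha-1,0}(t_0)}{t^2}. \]

If $\alpha > 3$, then

\[ \int_{t_0}^{+\infty} t\big(\Phi(x(t)) - \min\Phi\big)\,dt \le \frac{\mathcal E_{\alpha-1,0}(t_0)}{\alpha - 3} < +\infty. \]

Proof. Suppose $\alpha \ge 3$. Choose $\lambda = \alpha - 1$ and $\xi = 0$, so that $\xi - \lambda(\alpha - \lambda - 1) = \alpha - \lambda - 1 = 0$ and $\lambda - 2 = \alpha - 3$. Remark 2.6 gives

\[ \frac{d}{dt}\mathcal E_{\alpha-1,0}(t) \le -(\alpha - 3)\,t\big(\Phi(x) - \min\Phi\big), \tag{10} \]

and $\mathcal E_{\alpha-1,0}$ is nonincreasing. Since $t^2\big(\Phi(x) - \min\Phi\big) \le \mathcal E_{\alpha-1,0}(t)$, we obtain

\[ \Phi(x(t)) - \min\Phi \le \frac{\mathcal E_{\alpha-1,0}(t_0)}{t^2}. \]

If $\alpha > 3$, integrating (10) from $t_0$ to $t$ we obtain

\[ \int_{t_0}^{t} s\big(\Phi(x(s)) - \min\Phi\big)\,ds \le \frac{1}{\alpha - 3}\big(\mathcal E_{\alpha-1,0}(t_0) - \mathcal E_{\alpha-1,0}(t)\big) \le \frac{\mathcal E_{\alpha-1,0}(t_0)}{\alpha - 3}, \]

which allows us to conclude. □

Remark 2.8. It would be interesting to know whether $\alpha = 3$ is critical for the convergence rate given above.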
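The two bounds of Theorem 2.7 can be monitored along a numerically integrated trajectory. The sketch below makes several assumptions for illustration only: a scalar potential Phi(x) = x^4/4 (convex, minimized at 0), a Runge-Kutta scheme, a trapezoidal rule for the weighted integral, and the hypothetical name `check_value_rate`.

```python
def check_value_rate(alpha=4.0, t0=1.0, x0=1.0, v0=0.0, t_end=40.0, h=1e-2):
    """Integrate (1) by RK4 for Phi(x) = x**4 / 4 and monitor the bounds of Theorem 2.7."""
    phi = lambda x: 0.25 * x ** 4      # hypothetical convex test potential, min Phi = 0 at x* = 0
    rhs = lambda t, x, v: (v, -(alpha / t) * v - x ** 3)
    # Anchored energy with lambda = alpha - 1 and xi = 0 at the initial time (x* = 0).
    e0 = t0 ** 2 * phi(x0) + 0.5 * ((alpha - 1.0) * x0 + t0 * v0) ** 2
    x, v = x0, v0
    worst = t0 ** 2 * phi(x0)          # running max of t**2 * (Phi - min Phi)
    integral, prev = 0.0, t0 * phi(x0)
    for i in range(int(round((t_end - t0) / h))):
        t = t0 + i * h
        k1 = rhs(t, x, v)
        k2 = rhs(t + h / 2, x + h / 2 * k1[0], v + h / 2 * k1[1])
        k3 = rhs(t + h / 2, x + h / 2 * k2[0], v + h / 2 * k2[1])
        k4 = rhs(t + h, x + h * k3[0], v + h * k3[1])
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        v += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
        cur = (t + h) * phi(x)
        integral += 0.5 * h * (prev + cur)   # trapezoidal rule for the weighted integral
        prev = cur
        worst = max(worst, (t + h) ** 2 * phi(x))
    return e0, worst, integral


e0, worst, integral = check_value_rate()
```

Here `worst` should stay below `e0`, and `integral` below `e0 / (alpha - 3)`, in agreement with the two estimates of the theorem.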


Remark 2.9. For the (first-order) steepest descent dynamical system, the typical rate of convergence is $O(1/t)$ (see, for instance, [31, Section 3.1]). For the second-order system (1), we have obtained a rate of $O(1/t^2)$. It would be interesting to know whether higher-order systems give the corresponding rates of convergence. Another challenging question is the convergence rate of the trajectories defined by differential equations involving fractional time derivatives, as well as integro-differential equations.

2.5. Some examples and counterexamples. A convergence rate of $O(1/t^2)$ may be attained, even if $\operatorname{argmin}\Phi = \emptyset$ and $\alpha < 3$. This is illustrated in the following example:

Example 2.10. Let $\mathcal H = \mathbb R$ and take $\Phi(x) = \frac{\alpha-1}{2}e^{-2x}$ with $\alpha > 1$. Let us verify that $x(t) = \ln t$ is a solution of (1). On the one hand,

\[ \ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) = \frac{\alpha - 1}{t^2}. \]

On the other hand, $\nabla\Phi(x) = -(\alpha - 1)e^{-2x}$, which gives

\[ \nabla\Phi(x(t)) = -(\alpha - 1)e^{-2\ln t} = -\frac{\alpha - 1}{t^2}. \]

Thus, $x(t) = \ln t$ is a solution of (1). Let us examine the minimizing property. We have $\inf\Phi = 0$, and

\[ \Phi(x(t)) = \frac{\alpha - 1}{2}e^{-2\ln t} = \frac{\alpha - 1}{2t^2}. \]
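The computations of Example 2.10 can be double-checked by direct substitution. The sketch below evaluates the left-hand side of (1) along the logarithmic trajectory and the scaled value t^2 Phi(x(t)); the function names and the sample points are arbitrary choices made for illustration.

```python
import math


def residual_log_trajectory(t, alpha):
    """Left-hand side of (1) for x(t) = ln t and Phi(x) = (alpha - 1) / 2 * exp(-2 x)."""
    x = math.log(t)
    x_dot = 1.0 / t
    x_ddot = -1.0 / t ** 2
    grad_phi = -(alpha - 1.0) * math.exp(-2.0 * x)
    return x_ddot + (alpha / t) * x_dot + grad_phi


def scaled_value(t, alpha):
    """t**2 * Phi(x(t)); the example predicts the constant (alpha - 1) / 2."""
    return t ** 2 * 0.5 * (alpha - 1.0) * math.exp(-2.0 * math.log(t))


times = (1.0, 2.0, 10.0, 100.0)
residuals = [residual_log_trajectory(t, 2.0) for t in times]
scaled = [scaled_value(t, 2.0) for t in times]
```

With alpha = 2, a value below 3, the residuals vanish up to rounding and the scaled values stay at 1/2, which illustrates that the rate of order 1/t^2 is attained exactly in this example.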


Therefore, one may wonder whether the rapid convergence of the values is true in general. The following example 
shows that this is not the case: 

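Example 2.11. Let $\mathcal H = \mathbb R$ and take $\Phi(x) = \frac{c}{x^{\theta}}$, with $\theta > 0$, $\alpha > \frac{\theta}{2+\theta}$ and $c = \frac{2(2\alpha + \theta(\alpha - 1))}{\theta(2+\theta)^2}$. Let us verify that $x(t) = t^{\frac{2}{2+\theta}}$ is a solution of (1). On the one hand,

\[ \ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) = \frac{2}{(2+\theta)^2}\big(2\alpha + \theta(\alpha - 1)\big)\,t^{-\frac{2(1+\theta)}{2+\theta}}. \]

On the other hand, $\nabla\Phi(x) = -c\,\theta\,x^{-\theta-1}$, which gives

\[ \nabla\Phi(x(t)) = -c\,\theta\,t^{-\frac{2(1+\theta)}{2+\theta}} = -\frac{2}{(2+\theta)^2}\big(2\alpha + \theta(\alpha - 1)\big)\,t^{-\frac{2(1+\theta)}{2+\theta}}. \]

Thus, $x(t) = t^{\frac{2}{2+\theta}}$ is a solution of (1). Let us examine the minimizing property. We have $\inf\Phi = 0$, and

\[ \Phi(x(t)) = c\,\frac{1}{t^{\frac{2\theta}{2+\theta}}}, \quad\text{with}\quad \frac{2\theta}{2+\theta} < 2. \]

We conclude that the order of convergence may be strictly slower than $O(1/t^2)$ when $\operatorname{argmin}\Phi = \emptyset$. In Example 2.11, this occurs no matter how large $\alpha$ is. The speed of convergence of $\Phi(x(t))$ to $\inf\Phi$ depends on the behavior of $\Phi(x)$ as $\|x\| \to +\infty$. The above examples suggest that, when $\Phi(x)$ decreases rapidly and attains its infimal value as $\|x\| \to +\infty$, we can expect fast convergence of $\Phi(x(t))$.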

Even when $\operatorname{argmin}\Phi \neq \emptyset$, $O(1/t^2)$ is the worst possible case for the rate of convergence, attained as a limit in the following example:

Example 2.12. Take $\mathcal H = \mathbb R$ and $\Phi(x) = c|x|^{\gamma}$, where $c$ and $\gamma$ are positive parameters. Let us look for nonnegative solutions of (1) of the form $x(t) = \frac{1}{t^{\theta}}$, with $\theta > 0$. This means that the trajectory is not oscillating; it is a completely damped trajectory. We begin by determining the values of $c$, $\gamma$ and $\theta$ that provide such solutions. On the one hand,

\[ \ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t) = \theta(\theta + 1 - \alpha)\frac{1}{t^{\theta+2}}. \]

On the other hand, $\nabla\Phi(x) = c\gamma|x|^{\gamma-2}x$, which gives

\[ \nabla\Phi(x(t)) = c\gamma\frac{1}{t^{\theta(\gamma-1)}}. \]

Thus, $x(t) = \frac{1}{t^{\theta}}$ is a solution of (1) if, and only if,

i) $\theta + 2 = \theta(\gamma - 1)$, which is equivalent to $\gamma > 2$ and $\theta = \frac{2}{\gamma-2}$; and

ii) $c\gamma = \theta(\alpha - \theta - 1)$, which is equivalent to $\alpha > \frac{\gamma}{\gamma-2}$ and $c = \frac{2}{\gamma(\gamma-2)}\Big(\alpha - \frac{\gamma}{\gamma-2}\Big)$.

We have $\min\Phi = 0$ and

\[ \Phi(x(t)) = \frac{2}{\gamma(\gamma-2)}\Big(\alpha - \frac{\gamma}{\gamma-2}\Big)\frac{1}{t^{\frac{2\gamma}{\gamma-2}}}. \]

The speed of convergence of $\Phi(x(t))$ to 0 depends on the parameter $\gamma$. As $\gamma$ tends to infinity, the exponent $\frac{2\gamma}{\gamma-2}$ tends to 2. This limiting situation is obtained by taking a function $\Phi$ that becomes very flat around the set of its minimizers. Therefore, without other geometric assumptions on $\Phi$, we cannot expect a convergence rate better than $O(1/t^2)$. By contrast, in Section 3, we will show better rates of convergence under some geometrical assumptions, like strong convexity of $\Phi$.
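The parameter relations of Example 2.12 lend themselves to a quick computation. The sketch below returns theta, c and the decay exponent 2 gamma/(gamma - 2) for given gamma and alpha, and evaluates the left-hand side of (1) along the power trajectory; the function names and the sample values are assumptions made for illustration.

```python
def power_example(gamma, alpha):
    """Parameters of Example 2.12: returns (theta, c, decay exponent of Phi(x(t)))."""
    if gamma <= 2 or alpha <= gamma / (gamma - 2):
        raise ValueError("need gamma > 2 and alpha > gamma / (gamma - 2)")
    theta = 2.0 / (gamma - 2.0)
    c = 2.0 / (gamma * (gamma - 2.0)) * (alpha - gamma / (gamma - 2.0))
    return theta, c, 2.0 * gamma / (gamma - 2.0)


def residual(t, gamma, alpha):
    """Left-hand side of (1) along x(t) = t**(-theta) for Phi(x) = c |x|**gamma."""
    theta, c, _ = power_example(gamma, alpha)
    x = t ** (-theta)
    x_dot = -theta * t ** (-theta - 1.0)
    x_ddot = theta * (theta + 1.0) * t ** (-theta - 2.0)
    grad_phi = c * gamma * abs(x) ** (gamma - 2.0) * x
    return x_ddot + (alpha / t) * x_dot + grad_phi


exponents = [power_example(g, 10.0)[2] for g in (4.0, 10.0, 100.0, 1000.0)]
```

The list `exponents` decreases toward 2 as gamma grows, which is the limiting behavior described in the example.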

2.6. Weak convergence of the trajectories. In this subsection, we show the convergence of the solutions of (1), provided $\alpha > 3$. We begin by establishing some preliminary estimations that cannot be derived from the analysis carried out in [38]. The first statement improves part v) of Theorem 2.3, while the second one is the key to proving the convergence of the trajectories of (1):

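Theorem 2.13. Let $x : [t_0,+\infty[ \to \mathcal H$ be a solution of (1) with $\operatorname{argmin}\Phi \neq \emptyset$.

i) If $\alpha \ge 3$ and $x$ is bounded, then $\|\dot{x}(t)\| = O(1/t)$. More precisely,

\[ \|\dot{x}(t)\| \le \frac{1}{t}\Big(\sqrt{2\mathcal E_{\alpha-1,0}(t_0)} + (\alpha - 1)\sup_{t \ge t_0}\|x(t) - x^*\|\Big). \tag{11} \]

ii) If $\alpha > 3$, then $x$ is bounded and

\[ \int_{t_0}^{+\infty} t\|\dot{x}(t)\|^2\,dt \le \frac{\mathcal E_{2,2(\alpha-3)}(t_0)}{\alpha - 3} < +\infty. \tag{12} \]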


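Proof. To prove i), assume $\alpha \ge 3$ and $x$ is bounded. From the definition of $\mathcal E_{\lambda,\xi}$, we have $\frac{1}{2}\|\lambda(x - x^*) + t\dot{x}\|^2 \le \mathcal E_{\lambda,\xi}(t)$, and so $\|t\dot{x}\| \le \sqrt{2\mathcal E_{\lambda,\xi}(t)} + \lambda\|x - x^*\|$. By Remark 2.6, $\mathcal E_{\alpha-1,0}$ is nonincreasing, and we immediately obtain (11).

In order to show ii), suppose now that $\alpha > 3$. Choose $\lambda = 2$ and $\xi^* = 2(\alpha - 3)$. By Remark 2.6, we have

\[ \frac{d}{dt}\mathcal E_{\lambda,\xi^*}(t) \le -(\alpha - 3)\,t\|\dot{x}\|^2, \tag{13} \]

and $\mathcal E_{\lambda,\xi^*}$ is nonincreasing. From the definition of $\mathcal E_{\lambda,\xi}$, we deduce that $\|x(t) - x^*\|^2 \le \frac{2}{\xi^*}\mathcal E_{\lambda,\xi^*}(t)$, which gives

\[ \|x(t) - x^*\|^2 \le \frac{\mathcal E_{2,2(\alpha-3)}(t)}{\alpha - 3} \le \frac{\mathcal E_{2,2(\alpha-3)}(t_0)}{\alpha - 3} \tag{14} \]

and establishes the boundedness of $x$. Integrating (13) from $t_0$ to $t$, and recalling that $\mathcal E_{\lambda,\xi^*}$ is nonnegative, we obtain

\[ \int_{t_0}^{t} s\|\dot{x}(s)\|^2\,ds \le \frac{\mathcal E_{2,2(\alpha-3)}(t_0)}{\alpha - 3}, \]

as required. □

Remark 2.14. In view of (11) and (14), when $\alpha > 3$, we obtain the following explicit bound for $\|\dot{x}\|$, namely

\[ \|\dot{x}(t)\| \le \frac{1}{t}\left(\sqrt{2\mathcal E_{\alpha-1,0}(t_0)} + (\alpha - 1)\sqrt{\frac{\mathcal E_{2,2(\alpha-3)}(t_0)}{\alpha - 3}}\right). \]

Since $\lim_{t\to+\infty}\|\dot{x}(t)\| = 0$ by Theorem 2.3, we also have $\lim_{t\to+\infty} t\|\dot{x}(t)\|^2 = 0$.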

We are now in a position to prove the weak convergence of the trajectories of (1), which is the main result of this section:



Theorem 2.15. Let $\Phi : \mathcal H \to \mathbb R$ be a continuously differentiable convex function. Let $\operatorname{argmin}\Phi \neq \emptyset$ and let $x : [t_0,+\infty[ \to \mathcal H$ be a solution of (1) with $\alpha > 3$. Then $x(t)$ converges weakly, as $t \to +\infty$, to a point in $\operatorname{argmin}\Phi$.

Proof. We shall use Opial's Lemma 6.2. To this end, let $x^* \in \operatorname{argmin}\Phi$ and recall from (6) that

\[ \ddot{h}_{x^*}(t) + \frac{\alpha}{t}\dot{h}_{x^*}(t) + \Phi(x(t)) - \min\Phi \le \|\dot{x}(t)\|^2, \]

where $h_{x^*}$ is given by (4). This yields

\[ t\ddot{h}_{x^*}(t) + \alpha\dot{h}_{x^*}(t) \le t\|\dot{x}(t)\|^2. \]

In view of Theorem 2.13, part ii), the right-hand side is integrable on $[t_0,+\infty[$. Lemma 6.3 then implies that $\lim_{t\to+\infty}h_{x^*}(t)$ exists. This gives the first hypothesis in Opial's Lemma. The second one was established in part ii) of Theorem 2.3. □


Remark 2.16. A puzzling question concerns the convergence of the trajectories for $\alpha = 3$, a question which is directly related to the convergence of the sequences generated by Nesterov's method.

2.7. Further stabilization results. Let us complement the study of equation (1) by examining the asymptotic behavior of the acceleration $\ddot{x}$. To this end, we shall use an additional regularity assumption on the gradient of $\Phi$.


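Proposition 2.17. Let $\alpha > 3$ and let $x : [t_0,+\infty[ \to \mathcal H$ be a solution of (1) with $\operatorname{argmin}\Phi \neq \emptyset$. Assume $\nabla\Phi$ is Lipschitz-continuous on bounded sets. Then $\ddot{x}$ is bounded, globally Lipschitz-continuous on $[t_0,+\infty[$, and satisfies

\[ \lim_{t\to+\infty}\frac{1}{t^{\alpha}}\int_{t_0}^{t} s^{\alpha}\|\ddot{x}(s)\|^2\,ds = 0. \]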


Proof. First recall that $x$ and $\dot{x}$ are bounded, by virtue of Theorems 2.13 and 2.3, respectively. By (1), we have

\[ \ddot{x}(t) = -\frac{\alpha}{t}\dot{x}(t) - \nabla\Phi(x(t)). \tag{15} \]

Since $\nabla\Phi$ is Lipschitz-continuous on bounded sets, it follows from (15), and the boundedness of $x$ and $\dot{x}$, that $\ddot{x}$ is bounded on $[t_0,+\infty[$. As a consequence, $\dot{x}$ is Lipschitz-continuous on $[t_0,+\infty[$. Returning to (15), we deduce that $\ddot{x}$ is Lipschitz-continuous on $[t_0,+\infty[$.

Pick $x^* \in \operatorname{argmin}\Phi$, set $h = h_{x^*}$ (to simplify the notation) and use (5) to obtain

\[ \ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) + \langle x(t) - x^*, \nabla\Phi(x(t))\rangle = \|\dot{x}(t)\|^2. \tag{16} \]

Let $L$ be a Lipschitz constant for $\nabla\Phi$ on some ball containing the minimizer $x^*$ and the trajectory $x$. By virtue of the Baillon-Haddad Theorem (see, for instance, [17], [35, Theorem 3.13] or [Ml, Theorem 2.1.5]), $\nabla\Phi$ is $\frac{1}{L}$-cocoercive on that ball, which means that

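\[ \langle x(t) - x^*, \nabla\Phi(x(t)) - \nabla\Phi(x^*)\rangle \ge \frac{1}{L}\|\nabla\Phi(x(t)) - \nabla\Phi(x^*)\|^2. \]

Substituting this inequality in (16), and using the fact that $\nabla\Phi(x^*) = 0$, we obtain

\[ \ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) + \frac{1}{L}\|\nabla\Phi(x(t))\|^2 \le \|\dot{x}(t)\|^2. \]

In view of (15), this gives

\[ \ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) + \frac{1}{L}\Big\|\ddot{x}(t) + \frac{\alpha}{t}\dot{x}(t)\Big\|^2 \le \|\dot{x}(t)\|^2. \]

Developing the square on the left-hand side, and neglecting the nonnegative term $(\alpha\|\dot{x}(t)\|/t)^2/L$, we obtain

\[ \ddot{h}(t) + \frac{\alpha}{t}\dot{h}(t) + \frac{1}{L}\|\ddot{x}(t)\|^2 + \frac{\alpha}{Lt}\frac{d}{dt}\|\dot{x}(t)\|^2 \le \|\dot{x}(t)\|^2. \]

We multiply this inequality by $t^{\alpha}$ to obtain

\[ \frac{d}{dt}\big(t^{\alpha}\dot{h}(t)\big) + \frac{1}{L}t^{\alpha}\|\ddot{x}(t)\|^2 + \frac{\alpha}{L}t^{\alpha-1}\frac{d}{dt}\|\dot{x}(t)\|^2 \le t^{\alpha}\|\dot{x}(t)\|^2. \]

Integration from $t_0$ to $t$ yields

\[ t^{\alpha}\dot{h}(t) - t_0^{\alpha}\dot{h}(t_0) + \frac{1}{L}\int_{t_0}^{t} s^{\alpha}\|\ddot{x}(s)\|^2\,ds + \frac{\alpha}{L}\Big(t^{\alpha-1}\|\dot{x}(t)\|^2 - t_0^{\alpha-1}\|\dot{x}(t_0)\|^2 - (\alpha - 1)\int_{t_0}^{t}\|\dot{x}(s)\|^2 s^{\alpha-2}\,ds\Big) \le \int_{t_0}^{t} s^{\alpha}\|\dot{x}(s)\|^2\,ds. \]

Neglecting the nonnegative term $\alpha t^{\alpha-1}\|\dot{x}(t)\|^2/L$, and assuming without loss of generality that $L \ge \alpha$ (a Lipschitz constant can always be enlarged), we obtain

\[ t^{\alpha}\dot{h}(t) + \frac{1}{L}\int_{t_0}^{t} s^{\alpha}\|\ddot{x}(s)\|^2\,ds \le C + (\alpha - 1)\int_{t_0}^{t}\|\dot{x}(s)\|^2 s^{\alpha-2}\,ds + \int_{t_0}^{t} s^{\alpha}\|\dot{x}(s)\|^2\,ds, \tag{17} \]

where $C = t_0^{\alpha}\dot{h}(t_0) + \frac{\alpha}{L}t_0^{\alpha-1}\|\dot{x}(t_0)\|^2$. Enlarging $C$ if necessary, we may assume $C \ge 0$.

If $t_0 < 1$, we have

\[ \frac{1}{t^{\alpha}}\int_{t_0}^{t} s^{\alpha}\|\ddot{x}(s)\|^2\,ds = \frac{1}{t^{\alpha}}\int_{t_0}^{1} s^{\alpha}\|\ddot{x}(s)\|^2\,ds + \frac{1}{t^{\alpha}}\int_{1}^{t} s^{\alpha}\|\ddot{x}(s)\|^2\,ds \]

for all $t \ge 1$. Since the first term on the right-hand side tends to 0 as $t \to +\infty$, we may assume, without loss of generality, that $t_0 \ge 1$.

Observe now that $s^{\alpha-2} \le s^{\alpha}$, whenever $s \ge 1$. Whence, inequality (17) simplifies to

\[ t^{\alpha}\dot{h}(t) + \frac{1}{L}\int_{t_0}^{t} s^{\alpha}\|\ddot{x}(s)\|^2\,ds \le C + \alpha\int_{t_0}^{t} s^{\alpha}\|\dot{x}(s)\|^2\,ds. \]

Dividing by $t^{\alpha}$ and integrating again, we obtain

\[ h(t) - h(t_0) + \frac{1}{L}\int_{t_0}^{t}\tau^{-\alpha}\Big(\int_{t_0}^{\tau} s^{\alpha}\|\ddot{x}(s)\|^2\,ds\Big)d\tau \le \frac{C}{\alpha - 1}\big(t_0^{-\alpha+1} - t^{-\alpha+1}\big) + \alpha\int_{t_0}^{t}\tau^{-\alpha}\Big(\int_{t_0}^{\tau} s^{\alpha}\|\dot{x}(s)\|^2\,ds\Big)d\tau. \]

Setting $C' = h(t_0) + C t_0^{-\alpha+1}/(\alpha - 1)$, and neglecting the nonnegative term $h(t)$ of the left-hand side and the nonpositive term $-C t^{-\alpha+1}/(\alpha - 1)$ of the right-hand side, we get

\[ \frac{1}{L}\int_{t_0}^{t}\tau^{-\alpha}\Big(\int_{t_0}^{\tau} s^{\alpha}\|\ddot{x}(s)\|^2\,ds\Big)d\tau \le C' + \alpha\int_{t_0}^{t}\tau^{-\alpha}\Big(\int_{t_0}^{\tau} s^{\alpha}\|\dot{x}(s)\|^2\,ds\Big)d\tau. \]

Set $g(\tau) = \tau^{-\alpha}\Big(\int_{t_0}^{\tau} s^{\alpha}\|\ddot{x}(s)\|^2\,ds\Big)$ and use Fubini's Theorem on the second integral to get

\[ \frac{1}{L}\int_{t_0}^{t} g(\tau)\,d\tau \le C' + \frac{\alpha}{\alpha - 1}\int_{t_0}^{t} s^{\alpha}\|\dot{x}(s)\|^2\big(s^{-\alpha+1} - t^{-\alpha+1}\big)\,ds \le C' + \frac{\alpha}{\alpha - 1}\int_{t_0}^{t} s\|\dot{x}(s)\|^2\,ds. \]

By part ii) of Theorem 2.13, the integral on the right-hand side is finite. We have

\[ \int_{t_0}^{+\infty} g(\tau)\,d\tau < +\infty. \tag{18} \]

The derivative of $g$ is

\[ \dot{g}(\tau) = -\alpha\tau^{-\alpha-1}\int_{t_0}^{\tau} s^{\alpha}\|\ddot{x}(s)\|^2\,ds + \|\ddot{x}(\tau)\|^2. \]

Let $C''$ be an upper bound for $\|\ddot{x}\|^2$. We have

\[ |\dot{g}(\tau)| \le C''\Big(1 + \alpha\tau^{-\alpha-1}\int_{t_0}^{\tau} s^{\alpha}\,ds\Big) = C''\Big(1 + \frac{\alpha}{\alpha+1}\tau^{-\alpha-1}\big(\tau^{\alpha+1} - t_0^{\alpha+1}\big)\Big) \le C''\Big(1 + \frac{\alpha}{\alpha+1}\Big). \tag{19} \]

From (18) and (19) we deduce that $\lim_{\tau\to+\infty} g(\tau) = 0$ by virtue of Lemma 6.1. □

Remark 2.18. Since $\int_{t_0}^{t} s^{\alpha}\,ds = \frac{1}{\alpha+1}\big(t^{\alpha+1} - t_0^{\alpha+1}\big)$, Proposition 2.17 expresses a fast ergodic convergence of $\|\ddot{x}(s)\|^2$ to 0 with respect to the weight $s^{\alpha}$ as $t \to +\infty$, namely

\[ \frac{\int_{t_0}^{t} s^{\alpha}\|\ddot{x}(s)\|^2\,ds}{\int_{t_0}^{t} s^{\alpha}\,ds} = o\Big(\frac{1}{t}\Big). \]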


3. Strong convergence results 

A counterexample due to Baillon [16] shows that the trajectories of the steepest descent dynamical system may converge weakly but not strongly. Nevertheless, under some additional geometrical or topological assumptions on $\Phi$, the steepest descent trajectories do converge strongly. This has been proved in the case where the function $\Phi$ is either even or strongly convex (see [22]), or when $\operatorname{int}(\operatorname{argmin}\Phi) \neq \emptyset$ (see [20, Theorem 3.13]). Some of these results have been extended to inertial dynamics, see [3] for the heavy ball with friction, and [8] for an inertial version of Newton's method. This suggests that convexity alone may not be sufficient for the trajectories of (1) to converge strongly, but one can reasonably expect it to be the case under some additional conditions. The purpose of this section is to establish this fact. The different types of hypotheses will be studied in independent subsections since different techniques are required.

3.1. Set of minimizers with nonempty interior. Let us begin by studying the case where $\operatorname{int}(\operatorname{argmin}\Phi) \neq \emptyset$.

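Theorem 3.1. Let $\Phi : \mathcal H \to \mathbb R$ be a continuously differentiable convex function. Let $\operatorname{int}(\operatorname{argmin}\Phi) \neq \emptyset$ and let $x : [t_0,+\infty[ \to \mathcal H$ be a solution of (1) with $\alpha > 3$. Then $x(t)$ converges strongly, as $t \to +\infty$, to a point in $\operatorname{argmin}\Phi$. Moreover,

\[ \int_{t_0}^{+\infty} t\|\nabla\Phi(x(t))\|\,dt < +\infty. \]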


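Proof. Since $\operatorname{int}(\operatorname{argmin}\Phi) \neq \emptyset$, there exist $x^* \in \operatorname{argmin}\Phi$ and some $\rho > 0$ such that $\nabla\Phi(z) = 0$ for all $z \in \mathcal H$ such that $\|z - x^*\| < \rho$. By the monotonicity of $\nabla\Phi$, for all $y \in \mathcal H$, we have

\[ \langle\nabla\Phi(y), y - z\rangle \ge 0. \]

Hence,

\[ \langle\nabla\Phi(y), y - x^*\rangle \ge \langle\nabla\Phi(y), z - x^*\rangle. \]

Taking the supremum with respect to $z \in \mathcal H$ such that $\|z - x^*\| < \rho$, we infer that

\[ \langle\nabla\Phi(y), y - x^*\rangle \ge \rho\|\nabla\Phi(y)\| \]

for all $y \in \mathcal H$. In particular,

\[ \langle\nabla\Phi(x(t)), x(t) - x^*\rangle \ge \rho\|\nabla\Phi(x(t))\|. \]

By using this inequality in (8) with $\lambda = \alpha - 1$ and $\xi = 0$, we obtain

\[ \frac{d}{dt}\mathcal E_{\alpha-1,0}(t) + (\alpha - 1)\rho\,t\|\nabla\Phi(x(t))\| \le 2t\big(\Phi(x(t)) - \min\Phi\big), \]

whence we derive, by integrating from $t_0$ to $t$,

\[ \mathcal E_{\alpha-1,0}(t) - \mathcal E_{\alpha-1,0}(t_0) + (\alpha - 1)\rho\int_{t_0}^{t} s\|\nabla\Phi(x(s))\|\,ds \le 2\int_{t_0}^{t} s\big(\Phi(x(s)) - \min\Phi\big)\,ds. \]

Since $\mathcal E_{\alpha-1,0}(t)$ is nonnegative, the second part of Theorem 2.7 gives

\[ \int_{t_0}^{+\infty} t\|\nabla\Phi(x(t))\|\,dt < +\infty. \tag{20} \]

Finally, rewrite (1) as

\[ t\ddot{x}(t) + \alpha\dot{x}(t) = -t\nabla\Phi(x(t)). \]

Since the right-hand side is integrable, we conclude by applying Lemma 6.4 and Theorem 2.15. □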


3.2. Even functions. Let us recall that $\Phi : \mathcal H \to \mathbb R$ is even if $\Phi(-x) = \Phi(x)$ for every $x \in \mathcal H$. In this case the set $\operatorname{argmin}\Phi$ is nonempty and contains the origin.

Theorem 3.2. Let $\Phi : \mathcal H \to \mathbb R$ be a continuously differentiable convex even function and let $x : [t_0,+\infty[ \to \mathcal H$ be a solution of (1) with $\alpha > 3$. Then $x(t)$ converges strongly, as $t \to +\infty$, to a point in $\operatorname{argmin}\Phi$.


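Proof. For $t_0 \le \tau \le s$, set

\[ q(\tau) = \|x(\tau)\|^2 - \|x(s)\|^2 - \frac{1}{2}\|x(\tau) - x(s)\|^2. \]

We have

\[ \dot{q}(\tau) = \langle\dot{x}(\tau), x(\tau) + x(s)\rangle \quad\text{and}\quad \ddot{q}(\tau) = \|\dot{x}(\tau)\|^2 + \langle\ddot{x}(\tau), x(\tau) + x(s)\rangle. \]

Combining these two equalities and using (1), we obtain

\[ \ddot{q}(\tau) + \frac{\alpha}{\tau}\dot{q}(\tau) = \|\dot{x}(\tau)\|^2 + \Big\langle\ddot{x}(\tau) + \frac{\alpha}{\tau}\dot{x}(\tau), x(\tau) + x(s)\Big\rangle = \|\dot{x}(\tau)\|^2 - \langle\nabla\Phi(x(\tau)), x(\tau) + x(s)\rangle. \tag{21} \]

Recall that the energy $W(\tau) = \frac{1}{2}\|\dot{x}(\tau)\|^2 + \Phi(x(\tau))$ is nonincreasing. Therefore,

\[ \begin{aligned} \frac{1}{2}\|\dot{x}(\tau)\|^2 + \Phi(x(\tau)) &\ge \frac{1}{2}\|\dot{x}(s)\|^2 + \Phi(x(s)) \\ &= \frac{1}{2}\|\dot{x}(s)\|^2 + \Phi(-x(s)) \\ &\ge \frac{1}{2}\|\dot{x}(s)\|^2 + \Phi(x(\tau)) - \langle\nabla\Phi(x(\tau)), x(\tau) + x(s)\rangle, \end{aligned} \]

by convexity. After simplification, we obtain

\[ \frac{1}{2}\|\dot{x}(\tau)\|^2 \ge -\langle\nabla\Phi(x(\tau)), x(\tau) + x(s)\rangle. \tag{22} \]

Combining (21) and (22), we obtain

\[ \tau\ddot{q}(\tau) + \alpha\dot{q}(\tau) \le \frac{3}{2}\tau\|\dot{x}(\tau)\|^2. \]

As in the proof of Lemma 6.3, we have

\[ \dot{q}(\tau) \le k(\tau) := \frac{C t_0^{\alpha}}{\tau^{\alpha}} + \frac{3}{2\tau^{\alpha}}\int_{t_0}^{\tau} u^{\alpha}\|\dot{x}(u)\|^2\,du, \]

where $C = 2\|\dot{x}(t_0)\|\,\|x\|_{\infty}$. The function $k$ does not depend on $s$. Moreover, using Fubini's Theorem, we deduce that

\[ \int_{t_0}^{+\infty} k(\tau)\,d\tau \le \frac{C t_0}{\alpha - 1} + \frac{3}{2(\alpha - 1)}\int_{t_0}^{+\infty} u\|\dot{x}(u)\|^2\,du < +\infty, \]

by part ii) of Theorem 2.13. Integrating $\dot{q}(\tau) \le k(\tau)$ from $t$ to $s$, and noting that $q(s) = 0$, we obtain

\[ \frac{1}{2}\|x(t) - x(s)\|^2 \le \|x(t)\|^2 - \|x(s)\|^2 + \int_{t}^{s} k(\tau)\,d\tau. \]

Since $\Phi$ is even, we have $0 \in \operatorname{argmin}\Phi$. Hence $\lim_{t\to+\infty}\|x(t)\|^2$ exists (see the proof of Theorem 2.15). As a consequence, $x(t)$ has the Cauchy property as $t \to +\infty$, and hence converges. □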


3.3. Uniformly convex functions. Following [18], a function $\Phi : \mathcal H \to \mathbb R$ is uniformly convex on bounded sets if, for each $r > 0$, there is an increasing function $\omega_r : [0,+\infty[ \to [0,+\infty[$ vanishing only at 0, and such that

\[ \Phi(y) \ge \Phi(x) + \langle\nabla\Phi(x), y - x\rangle + \omega_r(\|x - y\|) \tag{23} \]

for all $x, y$ such that $\|x\| \le r$ and $\|y\| \le r$. Uniformly convex functions are strictly convex and coercive.

Theorem 3.3. Let $\Phi$ be uniformly convex on bounded sets, and let $x : [t_0,+\infty[ \to \mathcal H$ be a solution of (1) with $\alpha > 3$. Then $x(t)$ converges strongly, as $t \to +\infty$, to the unique $x^* \in \operatorname{argmin}\Phi$.


Proof. Recall that the trajectory $x(\cdot)$ is bounded, by part ii) in Theorem 2.13. Let $r > 0$ be such that $x$ is contained in the ball of radius $r$ centered at the origin. This ball also contains $x^*$, which is the weak limit of the trajectory in view of the weak lower-semicontinuity of the norm and Theorem 2.15. Writing $y = x(t)$ and $x = x^*$ in (23), we obtain

\[ \omega_r(\|x(t) - x^*\|) \le \Phi(x(t)) - \min\Phi. \]

The right-hand side tends to 0 as $t \to +\infty$ by virtue of Theorem 2.3. It follows that $x(t)$ converges strongly to $x^*$ as $t \to +\infty$. □


Let us recall that a function $\Phi : \mathcal H \to \mathbb R$ is strongly convex if there exists a positive constant $\mu$ such that

\[ \Phi(y) \ge \Phi(x) + \langle\nabla\Phi(x), y - x\rangle + \frac{\mu}{2}\|x - y\|^2 \]

for all $x, y \in \mathcal H$. Clearly, strongly convex functions are uniformly convex on bounded sets. However, a striking fact is that convergence rates increase indefinitely with larger values of $\alpha$ for these functions.


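Theorem 3.4. Let $\Phi : \mathcal H \to \mathbb R$ be strongly convex, and let $x : [t_0,+\infty[ \to \mathcal H$ be a solution of (1) with $\alpha > 3$. Then $x(t)$ converges strongly, as $t \to +\infty$, to the unique element $x^* \in \operatorname{argmin}\Phi$. Moreover

\[ \Phi(x(t)) - \min\Phi = O\big(t^{-\frac{2\alpha}{3}}\big), \qquad \|x(t) - x^*\|^2 = O\big(t^{-\frac{2\alpha}{3}}\big), \qquad\text{and}\qquad \|\dot{x}(t)\|^2 = O\big(t^{-\frac{2\alpha}{3}}\big). \tag{24} \]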

Proof. Strong convergence follows from Theorem 13.31 because strongly convex functions are uniformly convex on 
bounded sets. From © and the strong convexity of $, we deduce that 

— (p +2 — A)t^'''^($(a;) — min$) — A(q; — A — 1 — y)t^(a; — x*,a;) 

A 


-{yf-pX)tP ^\\x-x*f- {a-\-l-^tP+^\\xf 


for any A > 0 and any p > 0. Now fix p = |(a — 3) and A = |a, so that p-|-2 — A = a — A—I — p/2 = 0 and 
a — A—I—p = —p/2. The above inequality becomes 

< ^tP{x-x*,x) - ^{yt^ - pXy-^Wx - x*f. 


Define ti = max 


^^aW < ^tP{x-x*,x) 


for all t > ti. Integrate this inequality from ti to t (use integration by parts on the right-hand side) to get 
sm < Slih) + ^ (t^||x(t) - xip - t^Mh) - -pj' s^-^x{s) - x*fds^ . 


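Hence,

$$\mathcal{E}^p_\lambda(t) \le \mathcal{E}^p_\lambda(t_1) + \frac{\lambda p}{4}\,t^p\|x(t) - x^*\|^2 \le \mathcal{E}^p_\lambda(t_1) + \frac{\lambda p}{2\mu}\,t^p(\Phi(x(t)) - \min\Phi), \tag{25}$$

in view of the strong convexity of $\Phi$. By the definition of $\mathcal{E}^p_\lambda$, we have

$$t^{p+2}(\Phi(x(t)) - \min\Phi) \le \mathcal{E}^p_\lambda(t) \le \mathcal{E}^p_\lambda(t_1) + \frac{\lambda p}{2\mu}\,t^p(\Phi(x(t)) - \min\Phi).$$

Dividing by $t^{p+2}$ and using the definition of $t_1$, along with the fact that $t \ge t_1$, we obtain

$$\Phi(x(t)) - \min\Phi \le \mathcal{E}^p_\lambda(t_1)\,t^{-p-2} + \frac{\lambda p}{2\mu}\,t^{-2}(\Phi(x(t)) - \min\Phi) \le \mathcal{E}^p_\lambda(t_1)\,t^{-p-2} + \frac{1}{2}(\Phi(x(t)) - \min\Phi).$$

Recalling that $p = \frac{2}{3}(\alpha - 3)$ and $\lambda = \frac{2}{3}\alpha$, we deduce that

$$\Phi(x(t)) - \min\Phi \le 2\,\mathcal{E}^p_\lambda(t_1)\,t^{-p-2} = 2\,\mathcal{E}^p_\lambda(t_1)\,t^{-\frac{2\alpha}{3}}. \tag{26}$$

The strong convexity of $\Phi$ then gives

$$\|x(t) - x^*\|^2 \le \frac{2}{\mu}(\Phi(x(t)) - \min\Phi) \le \frac{4}{\mu}\,\mathcal{E}^p_\lambda(t_1)\,t^{-p-2} = \frac{4}{\mu}\,\mathcal{E}^p_\lambda(t_1)\,t^{-\frac{2\alpha}{3}}. \tag{27}$$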

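Inequalities (26) and (27) settle the first two points in (24).

Now, using (25) and (26), we derive

$$\mathcal{E}^p_\lambda(t) \le \mathcal{E}^p_\lambda(t_1) + \frac{\lambda p}{2\mu}\,t^p(\Phi(x(t)) - \min\Phi) \le \mathcal{E}^p_\lambda(t_1) + \frac{\lambda p}{\mu}\,\mathcal{E}^p_\lambda(t_1)\,t^{-2} \le \mathcal{E}^p_\lambda(t_1) + \frac{\lambda p}{\mu}\,\mathcal{E}^p_\lambda(t_1)\,t_1^{-2} \le 2\,\mathcal{E}^p_\lambda(t_1).$$

The definition of $\mathcal{E}^p_\lambda$ then gives

$$\frac{t^p}{2}\|\lambda(x(t) - x^*) + t\dot x(t)\|^2 \le \mathcal{E}^p_\lambda(t) \le 2\,\mathcal{E}^p_\lambda(t_1).$$

Hence

$$\|\lambda(x(t) - x^*) + t\dot x(t)\|^2 \le 4\,t^{-p}\,\mathcal{E}^p_\lambda(t_1),$$

and

$$t\|\dot x(t)\| \le 2\,t^{-p/2}\sqrt{\mathcal{E}^p_\lambda(t_1)} + \lambda\|x(t) - x^*\|.$$

But using (27), we deduce that

$$\|x(t) - x^*\| \le \frac{2}{\sqrt{\mu}}\,t^{-p/2-1}\sqrt{\mathcal{E}^p_\lambda(t_1)}.$$

The last two inequalities together give

$$t\|\dot x(t)\| \le 2\Bigl(1 + \frac{\lambda}{\sqrt{\mu}\,t}\Bigr)t^{-p/2}\sqrt{\mathcal{E}^p_\lambda(t_1)} \le 2\Bigl(1 + \sqrt{\frac{\alpha}{\alpha - 3}}\Bigr)t^{-p/2}\sqrt{\mathcal{E}^p_\lambda(t_1)},$$

where the second inequality uses $t \ge t_1 \ge \sqrt{p\lambda/\mu}$ and $\lambda/p = \alpha/(\alpha - 3)$. Taking squares, and rearranging the terms, we obtain

$$\|\dot x(t)\|^2 \le 4\Bigl(1 + \sqrt{\frac{\alpha}{\alpha - 3}}\Bigr)^2\mathcal{E}^p_\lambda(t_1)\,t^{-\frac{2\alpha}{3}},$$

which shows the last point in (24) and completes the proof. □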


□ 


The preceding theorem extends [38, Theorem 4.2], which states that if $\alpha > 9/2$, then $\Phi(x(t)) - \min\Phi = O(1/t^3)$.
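As a numerical illustration of Theorem 3.4 (a sketch under stated assumptions, not part of the analysis), one may integrate (1) for the one-dimensional quadratic $\Phi(x) = \frac{\mu}{2}x^2$, which is strongly convex with $\min\Phi = 0$ and $x^* = 0$. The classical fourth-order Runge-Kutta scheme, the function name `simulate`, and the values $\alpha = 6$, $\mu = 1$, $t_0 = 1$, $x(t_0) = 1$, $\dot x(t_0) = 0$ are choices made here for illustration only. The script evaluates the rescaled gap $t^{2\alpha/3}(\Phi(x(t)) - \min\Phi)$, which the theorem predicts to remain bounded.

```python
def simulate(alpha, mu, t0=1.0, x0=1.0, v0=0.0, t_end=50.0, dt=1e-3):
    """Integrate x'' + (alpha/t) x' + mu*x = 0 with classical RK4.

    This is equation (1) for the illustrative choice Phi(x) = mu*x**2/2.
    Returns the final time, position and velocity.
    """
    def rhs(t, x, v):
        # First-order form: x' = v, v' = -(alpha/t) v - grad Phi(x).
        return v, -(alpha / t) * v - mu * x

    t, x, v = t0, x0, v0
    n_steps = int(round((t_end - t0) / dt))
    for _ in range(n_steps):
        k1x, k1v = rhs(t, x, v)
        k2x, k2v = rhs(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
        k3x, k3v = rhs(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
        k4x, k4v = rhs(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
        v += dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
        t += dt
    return t, x, v


alpha, mu = 6.0, 1.0                      # illustrative values, alpha > 3
t_final, x_final, v_final = simulate(alpha, mu)
phi_end = 0.5 * mu * x_final ** 2         # Phi(x(t)) - min Phi, since min Phi = 0
scaled_gap = phi_end * t_final ** (2 * alpha / 3)
print(f"t = {t_final:.1f}, gap = {phi_end:.3e}, t^(2a/3) * gap = {scaled_gap:.3e}")
```

Repeating the run with several values of `t_end` gives a rough empirical check that the rescaled gap does not blow up; for this particular quadratic the decay is in fact faster than the guaranteed rate.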


4. Convergence of the associated algorithms 

In many situations, one is faced with nonsmooth convex minimization problems with an additive structure of
the form

$$\min\{\Phi(x) + \Psi(x) : x \in \mathcal{H}\}, \tag{28}$$

where $\Phi : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ is proper, lower-semicontinuous and convex, and $\Psi : \mathcal{H} \to \mathbb{R}$ is convex and continuously
differentiable.

Following the analysis carried out in the previous sections, it seems reasonable to consider the differential inclusion

$$\ddot x(t) + \frac{\alpha}{t}\dot x(t) + \partial\Phi(x(t)) + \nabla\Psi(x(t)) \ni 0, \tag{29}$$

in order to approximate optimal solutions for (28). This differential inclusion is a special instance of

$$\ddot x(t) + a(t)\dot x(t) + \partial\Theta(x(t)) \ni 0,$$

where $\Theta : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ is a convex lower-semicontinuous proper function, and $a(\cdot)$ is a positive damping parameter.
This differential inclusion has been studied in nni in the case of a fixed positive damping parameter $a(t) \equiv \gamma > 0$. In
that setting, and at least locally, each trajectory is Lipschitz continuous, its velocity has bounded variation, and its
acceleration is a bounded vectorial measure.

Thus, setting $\Theta(x) = \Phi(x) + \Psi(x)$, we can reasonably expect that the rapid convergence properties, studied in Section
2 for the solutions of (1), should hold for the solutions of (29) as well. In particular, we expect that $\Theta(x(t)) - \min\Theta = O(1/t^2)$, and
that each trajectory converges to an optimal solution. A detailed analysis is an interesting topic for further research,
but goes beyond the scope of this paper.

Using these ideas as a guideline, we shall introduce corresponding fast converging algorithms, making the link with
Nesterov [29]-[32], Beck-Teboulle [19], and the recent works of Chambolle-Dossal [26], and Su-Boyd-Candès [38]. More
precisely, it is possible to discretize (29) implicitly with respect to the nonsmooth function $\Phi$, and explicitly with
respect to the smooth function $\Psi$. Indeed, taking a time step size $h > 0$, and $t_k = kh$, $x_k = x(t_k)$, the classical finite
difference scheme for (29) gives

$$\frac{1}{h^2}(x_{k+1} - 2x_k + x_{k-1}) + \frac{\alpha}{kh^2}(x_k - x_{k-1}) + \partial\Phi(x_{k+1}) + \nabla\Psi(y_k) \ni 0, \tag{30}$$

where $y_k$ is a linear combination of $x_k$ and $x_{k-1}$, that will be made precise later. After developing (30), we obtain

$$x_{k+1} + h^2\partial\Phi(x_{k+1}) \ni x_k + \Bigl(1 - \frac{\alpha}{k}\Bigr)(x_k - x_{k-1}) - h^2\nabla\Psi(y_k). \tag{31}$$

A natural choice for $y_k$ leading to a simple formulation of the algorithm is

$$y_k = x_k + \Bigl(1 - \frac{\alpha}{k}\Bigr)(x_k - x_{k-1}). \tag{32}$$


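Of course, other choices are possible, which is an interesting topic for further research. Using the classical proximity operator

$$\operatorname{prox}_{\gamma\Phi}(x) = \operatorname{argmin}_{\xi \in \mathcal{H}}\Bigl\{\Phi(\xi) + \frac{1}{2\gamma}\|\xi - x\|^2\Bigr\} = (I + \gamma\partial\Phi)^{-1}(x),$$

and setting $\gamma = h^2$, the algorithm can be written as

$$\begin{cases} y_k = x_k + \bigl(1 - \frac{\alpha}{k}\bigr)(x_k - x_{k-1}); \\ x_{k+1} = \operatorname{prox}_{\gamma\Phi}\bigl(y_k - \gamma\nabla\Psi(y_k)\bigr). \end{cases} \tag{33}$$

For practical purposes, and in order to fit with the existing literature on the subject, it is convenient to work with the
following equivalent formulation

$$\begin{cases} y_k = x_k + \frac{k-1}{k+\alpha-1}(x_k - x_{k-1}); \\ x_{k+1} = \operatorname{prox}_{\gamma\Phi}\bigl(y_k - \gamma\nabla\Psi(y_k)\bigr). \end{cases} \tag{34}$$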


This algorithm is within the scope of the proximal-based inertial algorithms i, [28], m and forward-backward 
methods. It has been recently introduced by Chambolle-Dossal [26], and Su-Boyd-Candès [38]. For $\alpha = 3$, we recover
the classical algorithm based on the ideas of Nesterov and Güler, and developed by Beck-Teboulle (FISTA)

$$\begin{cases} y_k = x_k + \frac{k-1}{k+2}(x_k - x_{k-1}); \\ x_{k+1} = \operatorname{prox}_{h^2\Phi}\bigl(y_k - h^2\nabla\Psi(y_k)\bigr). \end{cases} \tag{35}$$


The fast convergence properties of the algorithm (34) were recently highlighted by Su-Boyd-Candès [38] and Chambolle-Dossal [26].
An important, and still open, question regarding the FISTA method, as described in (35), is the
convergence of the sequences $(x_k)$ and $(y_k)$. The main interest of considering the broader context given in (34) is that,
for $\alpha > 3$, these sequences converge. This has recently been obtained by Chambolle-Dossal [26]. Following [38], we
will see that the proof of the convergence properties of (34) can be obtained in a parallel way with the convergence
analysis in the continuous case.


More precisely, following the arguments in the preceding sections, one is able to prove the following: 

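Theorem 4.1. If $\alpha > 0$, then $\lim_{k \to +\infty}\Phi(x_k) = \inf\Phi$ and every weak limit point of $(x_k)$, as $k \to +\infty$, belongs to $\operatorname{argmin}\Phi$.
For $\operatorname{argmin}\Phi \neq \emptyset$, we have the following:

i) If $\alpha \ge 3$, then $\Phi(x_k) - \min\Phi = O(1/k^2)$ and $\|x_{k+1} - x_k\| = O(1/k)$.

ii) If $\alpha > 3$, then $\sum_k k(\Phi(x_k) - \min\Phi) < +\infty$, $\sum_k k\|x_{k+1} - x_k\|^2 < +\infty$, and $x_k$ converges weakly, as $k \to +\infty$,
to some $x^* \in \operatorname{argmin}\Phi$. Strong convergence holds if $\Phi$ is even, uniformly convex, or if $\operatorname{int}(\operatorname{argmin}\Phi) \neq \emptyset$.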
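
The inertial forward-backward iteration (34) can be sketched in a few lines of Python. The snippet is only an illustration under assumptions made here: the extrapolation coefficient is taken to be $(k-1)/(k+\alpha-1)$, the nonsmooth term is $\Phi = \rho\|\cdot\|_1$ (whose proximity operator is soft-thresholding), the smooth term is the diagonal quadratic $\Psi(x) = \frac{1}{2}\sum_i d_i(x_i - b_i)^2$, and the step is $\gamma = 1/\max_i d_i$. The names `inertial_forward_backward` and `soft_threshold`, and the data `d`, `b`, `rho`, are hypothetical.

```python
import math


def soft_threshold(v, tau):
    """Proximity operator of tau * ||.||_1, applied coordinate-wise."""
    return [math.copysign(max(abs(vi) - tau, 0.0), vi) for vi in v]


def inertial_forward_backward(grad_psi, prox_phi, x0, gamma, alpha=4.0, n_iter=500):
    """Run n_iter steps of the scheme
        y_k     = x_k + (k - 1)/(k + alpha - 1) * (x_k - x_{k-1}),
        x_{k+1} = prox_{gamma*Phi}(y_k - gamma * grad Psi(y_k)),
    starting from x_0 = x_{-1} = x0. Vectors are plain Python lists.
    """
    x_prev, x = list(x0), list(x0)
    for k in range(1, n_iter + 1):
        beta = (k - 1) / (k + alpha - 1)                 # inertial coefficient
        y = [xi + beta * (xi - xp) for xi, xp in zip(x, x_prev)]
        g = grad_psi(y)
        forward = [yi - gamma * gi for yi, gi in zip(y, g)]   # explicit (gradient) step
        x_prev, x = x, prox_phi(forward, gamma)               # implicit (proximal) step
    return x


# Hypothetical separable test problem with a closed-form minimizer.
d = [1.0, 0.1, 0.5]
b = [3.0, -2.0, 0.2]
rho = 0.1


def grad_psi(y):
    return [di * (yi - bi) for di, yi, bi in zip(d, y, b)]


def prox_phi(v, gamma):
    return soft_threshold(v, gamma * rho)


gamma = 1.0 / max(d)   # step size not exceeding 1/L, L being the Lipschitz constant of grad Psi
x_final = inertial_forward_backward(grad_psi, prox_phi, [0.0, 0.0, 0.0], gamma)
# Coordinate-wise minimizer of rho*|x| + d/2*(x - b)^2, used only for comparison.
x_exact = [math.copysign(max(abs(bi) - rho / di, 0.0), bi) for bi, di in zip(b, d)]
print(x_final, x_exact)
```

The default `alpha=4.0` lies in the range $\alpha > 3$ covered by item ii); setting `alpha=3.0` gives the FISTA-type coefficient $(k-1)/(k+2)$.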

5. Further remarks

5.1. Nonsmooth objective function. As mentioned at the beginning of Section 4, it is interesting to establish the
asymptotic properties, as $t \to +\infty$, of the solutions of the differential inclusion (29). Beyond global existence issues,
one must check that the Lyapunov analysis is still valid. In view of the validity of the subdifferential inequality for
convex functions, and of the (generalized) chain rule for derivatives over curves (see [20]), our conjecture is that most results
presented here can be transposed to this more general context, except for the stabilization of the acceleration, which
relies on the Lipschitz character of the gradient.

5.2. Time reparameterization. Let us examine briefly the effect of some simple rescaling procedures:

Invariance. The condition $\alpha > 3$ is not affected by a linear time rescaling. Indeed, if $a > 0$ and we take $t = as$ in
(1), we obtain

$$\ddot y(s) + \frac{\alpha}{s}\dot y(s) + a^2\nabla\Phi(y(s)) = 0,$$

where $y(s) = x(as)$. This produces an analogous system with $\Phi$ replaced by $\Phi_a := a^2\Phi$.
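
The invariance claim can be checked by a direct chain-rule computation. The following short derivation is a sketch that uses only the substitution $t = as$ and equation (1):

```latex
\begin{align*}
y(s) &= x(as), \qquad \dot y(s) = a\,\dot x(as), \qquad \ddot y(s) = a^{2}\,\ddot x(as), \\
0 &= \ddot x(as) + \frac{\alpha}{as}\,\dot x(as) + \nabla\Phi(x(as))
   = \frac{1}{a^{2}}\,\ddot y(s) + \frac{\alpha}{a^{2}s}\,\dot y(s) + \nabla\Phi(y(s)).
\end{align*}
```

Multiplying by $a^2$ leaves the damping coefficient $\alpha/s$ untouched and only rescales the potential, which is why the threshold on $\alpha$ does not change.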

Variable damping versus variable mass. If we take $t = \sqrt{s}$ in (1), that is $x(t) = z(t^2)$, then $\dot x(t) = 2t\,\dot z(t^2)$ and $\ddot x(t) = 4t^2\ddot z(t^2) + 2\dot z(t^2)$, and we obtain

$$4s\,\ddot z(s) + 2(\alpha + 1)\,\dot z(s) + \nabla\Phi(z(s)) = 0,$$

where $z(s) = x(\sqrt{s})$. In this alternative formulation, the viscous damping parameter is fixed, but the mass coefficient
becomes infinitely large as $s \to +\infty$. This suggests that a parallel analysis can be performed by controlling the mass
coefficient, instead of the viscosity coefficient.

5.3. Selection of the initial conditions. The constant in the order of convergence given by Theorem 2.7 is

$$K(x_0, v_0) = \mathcal{E}_{\alpha-1,0}(t_0) = t_0^2(\Phi(x_0) - \min\Phi) + \frac{1}{2}\|(\alpha - 1)(x_0 - x^*) + t_0v_0\|^2,$$

where $x_0 = x(t_0)$ and $v_0 = \dot x(t_0)$. This quantity is minimized when $x_0 \in \operatorname{argmin}\Phi$ and $v_0 = \frac{\alpha-1}{t_0}(x^* - x_0)$, with
$\min K = 0$. If $x_0 \neq x^*$, the trajectory will not be stationary, but the value $\Phi(x(t))$ will be constantly equal to $\min\Phi$.
Of course, selecting $x_0 \in \operatorname{argmin}\Phi$ is not realistic, and the point $x^*$ is unknown. Keeping $x_0$ fixed, the function
$v_0 \mapsto K(x_0, v_0)$ is minimized at $v_0 = \frac{\alpha-1}{t_0}(x^* - x_0)$. This suggests taking the initial velocity as a multiple of an
approximation of $x^* - x_0$, such as the gradient direction $v_0 = \nabla\Phi(x_0)$, the Newton or Levenberg-Marquardt direction
$v_0 = [\varepsilon I + \nabla^2\Phi(x_0)]^{-1}\nabla\Phi(x_0)$ ($\varepsilon \ge 0$), or the proximal point direction $v_0 = [(I + \gamma\nabla\Phi)^{-1}(x_0) - x_0]$ ($\gamma \gg 0$).
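
To make the role of the initial velocity concrete, the following small Python sketch evaluates the constant $K(x_0, v_0) = t_0^2(\Phi(x_0) - \min\Phi) + \frac{1}{2}\|(\alpha-1)(x_0 - x^*) + t_0v_0\|^2$ for a toy problem in which $x^*$ is known. The function names `energy_constant` and `best_initial_velocity`, the quadratic $\Phi(x) = \frac{1}{2}\|x\|^2$, and the numerical values are assumptions made for illustration; in practice $x^*$ is not available and would be replaced by an approximation.

```python
def energy_constant(x0, v0, x_star, phi, alpha, t0):
    """K(x0, v0) = t0^2 (Phi(x0) - min Phi) + 0.5 ||(alpha-1)(x0 - x*) + t0 v0||^2."""
    gap = phi(x0) - phi(x_star)
    residual = [(alpha - 1.0) * (a - s) + t0 * v for a, s, v in zip(x0, x_star, v0)]
    return t0 ** 2 * gap + 0.5 * sum(r * r for r in residual)


def best_initial_velocity(x0, x_star, alpha, t0):
    """Minimizer of v0 -> K(x0, v0): it cancels the squared-norm term."""
    return [(alpha - 1.0) / t0 * (s - a) for a, s in zip(x0, x_star)]


def phi(x):
    # Toy objective with minimizer x* = 0 and min Phi = 0.
    return 0.5 * sum(c * c for c in x)


x0, x_star, alpha, t0 = [2.0], [0.0], 4.0, 1.0
v_best = best_initial_velocity(x0, x_star, alpha, t0)
K_best = energy_constant(x0, v_best, x_star, phi, alpha, t0)
K_rest = energy_constant(x0, [0.0], x_star, phi, alpha, t0)
print(v_best, K_best, K_rest)
```

In this example, starting at rest leaves the full squared-norm term in $K$, whereas the velocity pointing towards $x^*$ with magnitude $(\alpha-1)\|x^* - x_0\|/t_0$ reduces $K$ to the potential part $t_0^2(\Phi(x_0) - \min\Phi)$.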

5.4. Continuous versus discrete. The analysis carried out in Section 4 for the inertial forward-backward algorithm
(34) is a reinterpretation of the proof of the corresponding results in the continuous case. In other words, we built a
complete proof having the continuous setting as a guideline. It would be interesting to know whether the results in 
[a [8] can be applied in order to deduce the asymptotic properties without repeating the proofs. 

5.5. Hessian-driven damping. In the dynamical system studied here, second-order information with respect to time
ultimately induces fast convergence properties. On the other hand, in Newton-type methods, second-order information
in space has a similar consequence. In a forthcoming paper, we analyze the solutions of the second-order evolution
equation

$$\ddot x(t) + \frac{\alpha}{t}\dot x(t) + \beta\nabla^2\Phi(x(t))\,\dot x(t) + \nabla\Phi(x(t)) = 0,$$

where $\Phi$ is a smooth convex function, and $\alpha$, $\beta$ are positive parameters. This inertial system combines an isotropic
viscous damping which vanishes asymptotically, and a geometrical Hessian-driven damping, which makes it naturally
related to Newton and Levenberg-Marquardt methods.

6. Appendix: Some auxiliary results

In this section, we present some auxiliary lemmas to be used later on. The following result can be found in [1]: 

Lemma 6.1. Let $\delta > 0$, $1 \le p < \infty$ and $1 \le r \le \infty$. Suppose $F \in L^p([\delta, \infty[)$ is a locally absolutely continuous
nonnegative function, $G \in L^r([\delta, \infty[)$ and

$$\frac{d}{dt}F(t) \le G(t)$$

for almost every $t > \delta$. Then $\lim_{t \to \infty} F(t) = 0$.

To establish the weak convergence of the solutions of (1), we will use Opial's Lemma [33], that we recall in its
continuous form. This argument was first used in [22] to establish the convergence of nonlinear contraction semigroups.

Lemma 6.2. Let $S$ be a nonempty subset of $\mathcal{H}$ and let $x : [0,+\infty) \to \mathcal{H}$. Assume that

(i) for every $z \in S$, $\lim_{t \to \infty}\|x(t) - z\|$ exists;

(ii) every weak sequential limit point of $x(t)$, as $t \to \infty$, belongs to $S$.

Then $x(t)$ converges weakly as $t \to \infty$ to a point in $S$.
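
The following allows us to establish the existence of a limit for a real-valued function, as $t \to +\infty$:

Lemma 6.3. Let $\delta > 0$, and let $w : [\delta, +\infty[ \to \mathbb{R}$ be a continuously differentiable function which is bounded from below.
Assume

$$t\ddot w(t) + \alpha\dot w(t) \le g(t), \tag{36}$$

for some $\alpha > 1$, almost every $t > \delta$, and some nonnegative function $g \in L^1(\delta, +\infty)$. Then, the positive part $[\dot w]_+$ of $\dot w$
belongs to $L^1(\delta, +\infty)$ and $\lim_{t \to +\infty} w(t)$ exists.

Proof. Multiply (36) by $t^{\alpha-1}$ to obtain

$$\frac{d}{dt}\bigl(t^{\alpha}\dot w(t)\bigr) \le t^{\alpha-1}g(t).$$

By integration, we obtain

$$\dot w(t) \le \frac{\delta^{\alpha}\dot w(\delta)}{t^{\alpha}} + \frac{1}{t^{\alpha}}\int_{\delta}^{t} s^{\alpha-1}g(s)\,ds.$$

Hence,

$$[\dot w]_+(t) \le \frac{\delta^{\alpha}|\dot w(\delta)|}{t^{\alpha}} + \frac{1}{t^{\alpha}}\int_{\delta}^{t} s^{\alpha-1}g(s)\,ds,$$

and so,

$$\int_{\delta}^{+\infty}[\dot w]_+(t)\,dt \le \frac{\delta|\dot w(\delta)|}{\alpha - 1} + \int_{\delta}^{+\infty}\frac{1}{t^{\alpha}}\Bigl(\int_{\delta}^{t} s^{\alpha-1}g(s)\,ds\Bigr)dt.$$

Applying Fubini's Theorem, we deduce that

$$\int_{\delta}^{+\infty}\frac{1}{t^{\alpha}}\Bigl(\int_{\delta}^{t} s^{\alpha-1}g(s)\,ds\Bigr)dt = \int_{\delta}^{+\infty}\Bigl(\int_{s}^{+\infty}\frac{dt}{t^{\alpha}}\Bigr)s^{\alpha-1}g(s)\,ds = \frac{1}{\alpha - 1}\int_{\delta}^{+\infty}g(s)\,ds.$$

As a consequence,

$$\int_{\delta}^{+\infty}[\dot w]_+(t)\,dt \le \frac{\delta|\dot w(\delta)|}{\alpha - 1} + \frac{1}{\alpha - 1}\int_{\delta}^{+\infty}g(s)\,ds < +\infty.$$

Finally, the function $\theta : [\delta, +\infty) \to \mathbb{R}$, defined by

$$\theta(t) = w(t) - \int_{\delta}^{t}[\dot w]_+(\tau)\,d\tau,$$

is nonincreasing and bounded from below. It follows that

$$\lim_{t \to +\infty} w(t) = \lim_{t \to +\infty}\theta(t) + \int_{\delta}^{+\infty}[\dot w]_+(\tau)\,d\tau$$

exists. □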

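The following is a vector-valued version of Lemma 6.3:

Lemma 6.4. Take $\delta > 0$, and let $F \in L^1(\delta, +\infty; \mathcal{H})$ be continuous. Let $x : [\delta, +\infty[ \to \mathcal{H}$ be a solution of

$$t\ddot x(t) + \alpha\dot x(t) = F(t) \tag{37}$$

with $\alpha > 1$. Then, $x(t)$ converges strongly in $\mathcal{H}$ as $t \to +\infty$.

Proof. As in the proof of Lemma 6.3, multiply (37) by $t^{\alpha-1}$ and integrate to obtain

$$\dot x(t) = \frac{\delta^{\alpha}\dot x(\delta)}{t^{\alpha}} + \frac{1}{t^{\alpha}}\int_{\delta}^{t} s^{\alpha-1}F(s)\,ds.$$

Integrate again to deduce that

$$x(t) = x(\delta) + \delta^{\alpha}\dot x(\delta)\int_{\delta}^{t}\frac{ds}{s^{\alpha}} + \int_{\delta}^{t}\frac{1}{s^{\alpha}}\Bigl(\int_{\delta}^{s}\tau^{\alpha-1}F(\tau)\,d\tau\Bigr)ds.$$

Fubini's Theorem applied to the last integral gives

$$x(t) = x(\delta) + \frac{\delta^{\alpha}\dot x(\delta)}{\alpha - 1}\Bigl(\frac{1}{\delta^{\alpha-1}} - \frac{1}{t^{\alpha-1}}\Bigr) + \frac{1}{\alpha - 1}\int_{\delta}^{t}F(s)\,ds - \frac{1}{(\alpha - 1)\,t^{\alpha-1}}\int_{\delta}^{t}s^{\alpha-1}F(s)\,ds. \tag{38}$$

Finally, apply Lemma 6.5 to the last integral with $\psi(s) = s^{\alpha-1}$ and $f(s) = \|F(s)\|$ to conclude that all the terms in
the right-hand side of (38) have a limit as $t \to +\infty$. □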

The following is a continuous version of Kronecker’s Theorem for series (see, for example, [571 page 129]): 
















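Lemma 6.5. Take $\delta > 0$, and let $f \in L^1(\delta, +\infty)$ be nonnegative and continuous. Consider a nondecreasing function
$\psi : (\delta, +\infty) \to (0, +\infty)$ such that $\lim_{t \to +\infty}\psi(t) = +\infty$. Then,

$$\lim_{t \to +\infty}\frac{1}{\psi(t)}\int_{\delta}^{t}\psi(s)f(s)\,ds = 0.$$

Proof. Given $\varepsilon > 0$, fix $t_\varepsilon$ sufficiently large so that

$$\int_{t_\varepsilon}^{+\infty}f(s)\,ds \le \varepsilon.$$

Then, for $t \ge t_\varepsilon$, split the integral $\int_{\delta}^{t}\psi(s)f(s)\,ds$ into two parts to obtain

$$\frac{1}{\psi(t)}\int_{\delta}^{t}\psi(s)f(s)\,ds = \frac{1}{\psi(t)}\int_{\delta}^{t_\varepsilon}\psi(s)f(s)\,ds + \frac{1}{\psi(t)}\int_{t_\varepsilon}^{t}\psi(s)f(s)\,ds \le \frac{1}{\psi(t)}\int_{\delta}^{t_\varepsilon}\psi(s)f(s)\,ds + \int_{t_\varepsilon}^{t}f(s)\,ds,$$

where the inequality uses that $\psi$ is nondecreasing. Now let $t \to +\infty$ to deduce that

$$0 \le \limsup_{t \to +\infty}\frac{1}{\psi(t)}\int_{\delta}^{t}\psi(s)f(s)\,ds \le \varepsilon.$$

Since this is true for any $\varepsilon > 0$, the result follows. □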
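
A simple instance may help to see what Lemma 6.5 asserts. The choices $\delta = 1$, $f(s) = s^{-2}$ and $\psi(s) = s$ below are ours, made purely for illustration; they satisfy the hypotheses, since $f$ is nonnegative, continuous and integrable on $(1, +\infty)$, and $\psi$ is positive, nondecreasing and unbounded.

```latex
\[
\frac{1}{\psi(t)}\int_{1}^{t}\psi(s)f(s)\,ds
  = \frac{1}{t}\int_{1}^{t}\frac{ds}{s}
  = \frac{\ln t}{t} \xrightarrow[t\to+\infty]{} 0 .
\]
```

Note that the weighted integral $\int_1^t \psi(s)f(s)\,ds = \ln t$ itself diverges in this example; it is only after division by $\psi(t)$ that the limit is zero.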